
A Sorghum bicolor expression atlas reveals 
dynamic genotype-specific expression profiles for 
vegetative tissues of grain, sweet and bioenergy 
sorghums 

Shakoor et al. 



BioMed Central 



Shakoor et al. BMC Plant Biology 2014, 14:35 
http://www.biomedcentral.com/1471-2229/14/35 









RESEARCH ARTICLE Open Access 




Nadia Shakoor1,2, Ramesh Nair1, Oswald Crasta1,3, Geoffrey Morris2, Alex Feltus4 and Stephen Kresovich2,4*
Abstract 

Background: Effective improvement in sorghum crop development necessitates a genomics-based approach to identify functional genes and QTLs. The sorghum genome was sequenced in 2009, and a comprehensive annotation of the genome, together with the development of functional genomics resources, is key to enabling the discovery and deployment of regulatory and metabolic genes and gene networks for crop improvement.

Results: This study utilizes the first commercially available whole-transcriptome sorghum microarray (Sorgh-WTa520972F) to identify tissue- and genotype-specific expression patterns for all identified Sorghum bicolor exons and UTRs. The genechip contains 1,026,373 probes covering 149,182 exons (27,577 genes) across the Sorghum bicolor nuclear, chloroplast, and mitochondrial genomes. Specific probesets were also included for putative non-coding RNAs that may play a role in gene regulation (e.g., microRNAs), and confirmed functional small RNAs in related species (maize and sugarcane) were also included in our array design. We generated expression data for 78 samples covering four tissue types (shoot, root, leaf and stem), two dissected stem tissues (pith and rind) and six diverse public sorghum lines (R159, Atlas, Fremont, PI152611, AR2400 and PI455230) representing grain, sweet, forage, and high biomass ideotypes.

Conclusions: Here we present a summary of the microarray dataset, including analysis of tissue-specific gene expression 
profiles and associated expression profiles of relevant metabolic pathways. With an aim to enable identification and 
functional characterization of genes in sorghum, this expression atlas presents a new and valuable resource to the 
research community. 
Background

Sorghum [Sorghum bicolor (L.) Moench] is a staple cereal crop for millions of people in the marginal, semi-arid environments of Africa and South Asia. Its unique ability to grow in regions of low and variable rainfall highlights its potential to improve agricultural productivity in water-limited environments worldwide [1,2]. Having originated and evolved across the diverse environmental landscape of Africa, sorghum has acquired morphological and physiological adaptations that make it a naturally heat- and drought-tolerant warm-season C4 grass that uses water, nitrogen and energy resources more efficiently than other major crops, including maize and wheat [1,3,4]. With seven million hectares of farmland under sorghum, the United States is currently the world's top sorghum producer (8.8 million metric tons annually), followed by India (7.0), Mexico (6.9), and Nigeria (4.8) (http://cgiar.org/sorghum). Because sorghum is cultivated in diverse climates and environmental conditions, increasing its performance and yield on marginal lands and in cooler climates remains at the forefront of sorghum improvement efforts worldwide [5,6].

Sorghum is globally established as an important source 
of food, feed, sugar and fiber, and recent interest in bioe- 
nergy feedstocks also spotlights sorghum as an attractive 
alternative for sustainable biofuel production [4]. Framed 
upon the 2009 sorghum reference genome [7], transla- 
tional genomic resources have been developed that dir- 
ectly impact research in other closely related C 4 feedstock 
grasses, including switchgrass and Miscanthus [8,9]. Com- 
prehensive understanding of the genetic and molecular 
mechanisms that regulate metabolite biosynthesis, trans- 
port and storage in these species is essential for the effi- 
cient development of biofuel feedstocks. 

Global transcriptome profiling further provides a means 
to access gene networks for the discovery of functional con- 
nections between genes, mRNAs and their regulatory pro- 
teins, and complex traits expressed through coordinated 
and dynamic gene networks across different tissues and 
developmental stages [10]. Over the last decade, micro- 
array-based expression profiling has provided a reliable 
high-throughput platform for genome-wide analysis of 
gene expression in many organisms. Microarrays offer 
substantial advantages for functional genomics, as they are 
increasingly cost-effective, provide a comparable accuracy 
of expression profiling to RNA-sequencing, and have been 
shown to provide comprehensive expression data (up 
to 90% of the transcriptome) in a given tissue [11]. Well- 
established microarray data analysis tools are also available 
for querying, visualizing and analyzing the genomes and 
predicted genes [12,13], as well as for analyzing the tran- 
scriptome profiling data and integrating with other public 
datasets [14-17]. 

To provide insight into the sorghum transcriptome, 
we generated a record of gene expression in a set of 
seven tissues and six diverse sorghum genotypes. The 
choice of samples reflects our aim to develop and enrich 
the current sorghum transcriptome literature. Previous 
studies have predominantly focused on reproductive tis- 
sues, and the majority of these reports do not represent 
the complete sorghum transcriptome. Several of these 
studies have also been limited to the reference genome 
(BTx623) or Keller, a recently resequenced sweet sor- 
ghum variety [18-22]. 

Comparable whole plant transcriptome maps are avail- 
able for a number of other model species, including Ara- 
bidopsis thaliana [23], maize (Zea mays) [24], barley 
(Hordeum vulgare) [25], rice (Oryza sativa) [26,27], and 
soybean (Glycine max) [28]. These recent transcriptome 
surveys were constructed with only one genotype or 
line/accession for their respective species of interest, 
whereas the present study aims to highlight the practical importance of examining expression profiles across diverse tissue types, developmental stages, and genotypes in order to accurately target genes and metabolic pathways for the efficient development of improved feedstocks.

Fundamental understanding of sorghum genomics is 
necessary for improving sorghum for agronomic and 
compositional traits. Specifically, genotypes with high 
biomass and increased levels of fermentable stem sugars 
are ideal for developing feedstocks for the biofuel 
industry. We developed this genomic resource (the whole-transcriptome array together with the vegetative transcriptome of diverse genotypes and tissues) to facilitate the characterization of molecular networks and regulatory mechanisms governing important metabolic pathways, including, but not limited to, cell wall biosynthesis for lignocellulosic biomass and the synthesis, translocation, and storage of fermentable photosynthates for energy content. We demonstrate the relevance of our dataset through the genotype- and tissue-specific expression of phenylpropanoid and lignin biosynthetic pathway genes.

Intended as a readily available public resource for functional gene characterization, the transcriptome data presented here are available through NCBI's Gene Expression Omnibus (GEO) under accession number GSE49879, and the Sorghum Genome Array is available through Affymetrix (http://affymetrix.com).

Results and discussion 

Generation and quality assessment of data 

A whole-transcriptome exon array for Sorghum bicolor, Sorgh-WTa520972F, was custom-designed by Chromatin, Inc. (http://chromatininc.com) and Affymetrix. This genechip contains 1,026,373 probes covering 149,182 exons (27,577 genes) across the Sorghum bicolor nuclear genome (10 chromosomes), chloroplast and mitochondria. The sequences used to construct the probesets included all identified Sorghum bicolor exons from the Sbi1 assembly (http://www.phytozome.net). Probes were chosen so that each exon was covered by at least one probe and each gene by at least 25 probes. In addition to standard Affymetrix controls, positive controls in the microarray design included probes for constitutively expressed Sorghum bicolor genes (actin, ubiquitin and eIF4A1). Probes for intronic regions of actin and ubiquitin were also used to determine background expression levels.
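As a quick sanity check on the design figures above, the average probe density can be worked out directly from the stated totals. This is a minimal sketch that assumes all 1,026,373 probes are distributed over the 149,182 exons; the text does not say how many probes are controls or small-RNA probes, so the true per-exon average may be somewhat lower.

```python
# Array-design totals as stated in the text (Sorgh-WTa520972F).
N_PROBES = 1_026_373
N_EXONS = 149_182
N_GENES = 27_577


def mean_ratio(numerator: int, denominator: int) -> float:
    """Average count per unit, rounded to two decimals."""
    return round(numerator / denominator, 2)


# Assumes every probe targets an exon (controls not subtracted).
probes_per_exon = mean_ratio(N_PROBES, N_EXONS)
probes_per_gene = mean_ratio(N_PROBES, N_GENES)
exons_per_gene = mean_ratio(N_EXONS, N_GENES)

print(f"probes per exon: {probes_per_exon}")
print(f"probes per gene: {probes_per_gene}")
print(f"exons per gene:  {exons_per_gene}")
```

An average of roughly 37 probes per gene sits comfortably above the stated minimum of 25, although an average says nothing about how evenly probes are spread across genes.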

To study the sorghum transcriptome and build a gene expression atlas, we collected 78 diverse samples from various developmental stages and tissue types (Additional file 1). To broadly capture sorghum genetic diversity, we included genotypes representing three major ideotypes: grain, sweet, and bioenergy sorghums. Our study includes R159, an elite grain sorghum characterized by the valuable agronomic traits of uniform growth and disease resistance [29]. Grain sorghum is cultivated primarily for its high starch content, for applications in human and animal health and nutrition, and as a biofuel feedstock for ethanol production [5]. We also included two sweet sorghums, Fremont and Atlas, that produce increased biomass and accumulate high levels of fermentable carbohydrates in the stem. Additionally, Fremont is drought resistant and flowers early, while Atlas is less susceptible to lodging (due to a stiff stalk phenotype) and flowers later [30]. We also selected three bioenergy or high biomass lines, PI455230, PI152611, and AR2400, that produce increased levels of cellulosic material and are photoperiod sensitive, which allows the plant to produce more vegetative matter under long-day conditions (Additional file 2). PI152611 is specifically a forage line, a fast-growing, highly digestible grass used for livestock feed [5,29].

The primary goal of this study was to obtain relevant and applicable data for the research community developing sorghum as a global feedstock; this research interest guided our sample selection towards vegetative tissues, with a strong bias for stem tissues. A comprehensive transcriptomic profile of sorghum inflorescence and leaf tissues was recently made available to the community [19]. We compared that leaf RNA sequencing dataset with the present leaf dataset to confirm that our microarray-based approach to transcriptome profiling was appropriate. The Spearman correlation of the transcriptome across technologies is 0.61 (Additional file 3), which is consistent with several studies comparing RNA-seq and microarray methods for genome-wide transcriptome profiling [31-33]. The present comparison corroborates these studies and indicates that microarray expression profiles are reasonably concordant with those from current sequencing methods. With a common goal of crop improvement, complementary datasets such as these generate a core of information that can be explored for the functional characterization of genes and genetic pathways.
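Spearman's correlation compares the rank order of genes on each platform, so it tolerates the very different scales of RMA intensities and RNA-seq abundance estimates. The sketch below is an illustration under assumptions: the gene values are hypothetical, and we assume genes have already been matched one-to-one between the two datasets, a step the text does not describe.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Group identical values and replace each group's ranks by their mean.
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    rank_sums = np.bincount(inverse, weights=ranks)
    group_sizes = np.bincount(inverse)
    return (rank_sums / group_sizes)[inverse]


def spearman(a, b) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


# Hypothetical values for six genes matched between the two platforms.
array_signal = np.array([320.0, 5100.0, 870.0, 64.0, 12000.0, 1500.0])  # RMA, linear
rnaseq_fpkm = np.array([2.1, 40.3, 9.8, 0.0, 150.2, 6.5])               # RNA-seq

rho = spearman(array_signal, rnaseq_fpkm)
print(f"Spearman rho = {rho:.3f}")
```

Averaging the ranks of ties matters for RNA-seq data, where many weakly expressed genes share a value of zero.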

Figure 1 Pearson's correlation matrix of the whole dataset. Pairwise Pearson correlation coefficients were calculated from the gene expression values of the whole transcriptome (27,577 genes) in all 78 samples. The hierarchical clusters were obtained based on Euclidean distance and are indicated by the color bar on the top side of the figure. The color scale indicates the degree of correlation.

We assessed hybridization data quality by comparing the normalized signals of all probe sets between biological replicates using Pearson's correlation analysis. The biological replicates were highly correlated, with an average Pearson's correlation coefficient of 0.99 (Additional file 4). The high reproducibility of the replicate data further supports the quality of the microarray platform and the present dataset. Previous studies have consistently established strong correlations between qRT-PCR data and microarray data processed using robust multi-array analysis (RMA) [34,35]. Nevertheless, we also tested a small subset of genes via qRT-PCR to validate the array-generated expression data and expression patterns across multiple tissue types (Additional file 5A and 5B).
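The replicate check amounts to one Pearson correlation per pair of biological replicates, averaged across sample groups. A minimal NumPy sketch on simulated log2 signals follows; the group names, noise levels and the choice of log2 scale are assumptions for illustration, not the study's data.

```python
import numpy as np


def replicate_correlations(pairs):
    """Pearson r between the two biological replicates of each sample group.

    `pairs` maps a sample-group name to (replicate_1, replicate_2), each a 1-D
    array of normalized probe-set signals in the same probe-set order.
    """
    return {name: float(np.corrcoef(r1, r2)[0, 1]) for name, (r1, r2) in pairs.items()}


# Hypothetical log2 signals: a per-group "true" profile plus small replicate noise.
rng = np.random.default_rng(seed=0)
shoot = rng.normal(8.0, 2.0, 5000)
root = rng.normal(8.0, 2.0, 5000)
pairs = {
    "shoot": (shoot + rng.normal(0.0, 0.10, 5000), shoot + rng.normal(0.0, 0.10, 5000)),
    "root": (root + rng.normal(0.0, 0.15, 5000), root + rng.normal(0.0, 0.15, 5000)),
}

r_by_group = replicate_correlations(pairs)
mean_r = float(np.mean(list(r_by_group.values())))
print(r_by_group, f"mean r = {mean_r:.3f}")
```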

To further assure data quality, we also examined the general expression patterns of well-characterized genes that have been highlighted for tissue-specific expression in previous studies. In microarray experiments with RNA isolated from shoot tips, we observed high expression levels for homologs of SPATULA, a transcription factor that is strongly expressed in shoot tips and young leaf primordia [36]. Similarly, the sorghum homolog of TIP2-3, a root-specific aquaporin gene [37], was expressed at higher levels in root-isolated RNA (Additional file 6).

Global gene expression patterns 

Figure 2 Cluster dendrogram of the whole dataset (78 samples). The hierarchical clusters of organs were grouped based on Euclidean distance. The 5 clusters are indicated by the color bar on the bottom side of the figure.

We detected the expression of 19,354 genes in at least one of the 78 samples, representing 70.2% of all genes on the array (27,577 genes). The number of expressed transcripts detected in the various tissues ranged between 10,850 and 11,587 (representing 56 to 60% of all expressed genes). Expressed genes were determined following established methods [24]: with a conservative, arbitrary expression threshold of 320 (five times the mean normalized signal from the intronic probes used as controls), we found that 15.4% of genes on the array (4,256/27,577) were detected in all tissues (Additional file 7). Gene ontology (GO) annotation analysis of these constitutively expressed genes reveals that most are involved in basic biological processes, including development, protein synthesis/modification, and signal transduction (Additional file 8). Similar to published work in maize, expression of constitutive genes varied among the samples, with the coefficient of variation (CV) ranging from 5% to 129%. With a CV of 10.4%, a ubiquitin-conjugating enzyme, Sb09g023560, was among the most stably expressed genes (Additional file 9). This class of genes was also identified in the maize atlas as the most stably expressed among variable tissues [24].
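The detection rule described above (a cut-off of five times the mean control-probe signal, applied to linear-scale RMA values) and the per-gene coefficient of variation can be sketched as follows. The control values and expression matrix are hypothetical, and each column is treated as one tissue; the study's exact handling of replicates and of "detected in all tissues" may differ.

```python
import numpy as np


def present_calls(expr, control_signal, fold=5.0):
    """Threshold-based present calls on a genes x tissues matrix.

    expr: linear-scale (not log2) RMA-normalized values, one row per gene.
    control_signal: normalized signals of the intronic/negative-control probes.
    Returns the threshold, a per-gene "expressed in any tissue" mask and a
    per-gene "expressed in every tissue" mask.
    """
    threshold = fold * float(np.mean(control_signal))
    detected = np.asarray(expr) >= threshold
    return threshold, detected.any(axis=1), detected.all(axis=1)


def coefficient_of_variation(expr):
    """Per-gene CV in percent: 100 * sample SD / mean across columns."""
    expr = np.asarray(expr, dtype=float)
    return 100.0 * expr.std(axis=1, ddof=1) / expr.mean(axis=1)


controls = np.array([60.0, 64.0, 68.0])   # hypothetical control signals, mean 64
expr = np.array([
    [400.0, 420.0, 380.0, 410.0],          # detected in every tissue
    [100.0, 900.0, 50.0, 30.0],            # detected in one tissue only
    [10.0, 20.0, 30.0, 40.0],              # never detected
])

threshold, in_any, in_all = present_calls(expr, controls)
cv = coefficient_of_variation(expr[in_all])   # CV of the constitutive genes only
print(threshold, in_any, in_all, cv.round(1))
```

With a control mean of 64, the cut-off is 5 × 64 = 320, matching the threshold quoted in the text.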

A diverse range of plant tissues was sampled in this study; however, 29.8% of the probesets were not detected above our designated expression threshold. Several plausible explanations can account for this incomplete expression coverage, including expression restricted to specific tissues and/or developmental stages not included in this study, false-positive gene models, and expression levels below the detection threshold. Further, the arrays were designed from the BTx623 reference sequence and do not capture polymorphisms, copy number variation and presence-absence variation across all the sampled genotypes.

Transcriptome-based classification of sorghum tissues 

Figure 3 Functional category distribution of tissue-specific transcripts. Expression levels of select Gene Ontology categories across tissue types. The Sbi1.4 version of the sorghum annotation allowed for the identification of ~85% of expressed genes across all tissue types. The transcripts were manually verified and grouped into 7 functional categories based on Plant GO slim classifications.

A Pearson's correlation matrix was constructed to compare and evaluate the transcriptome data from each sample (Figure 1). These data show strong correlations among and within the individual tissue types. The associated dendrogram reveals clustering according to tissue type as well as genotype, highlighting the significance of genotype-specific expression in this study (Figure 2). Using GO categories, functional analysis of the identified gene sets revealed enrichment of known tissue-specific biological processes. For example, as expected, the leaf- and shoot-associated gene sets were enriched for photosynthetic genes relative to the roots (Additional file 8). Genes involved in protein synthesis were over-represented in the seedling roots and shoots, whereas genes involved in metabolism were over-represented in the shoot tip and stem tissues (Figure 3). These data identify core sets of genes associated with various biological processes that are clear targets for future studies aimed at definitively characterizing their functions in specific tissues.

Differential transcriptomes of developmentally distinct vegetative tissues were also apparent from the principal component analysis (PCA) (Figure 4). The PCA reveals clustering of functionally related tissue types, and the first two principal components (PCs) explain 68% of the variance among samples (PC1 = 48%, PC2 = 20%). Apical meristematic zones of the roots and shoot tips clustered together and weakly clustered with leaves, shoots and stem tissues. The large group of stem tissues (46 samples), including internode, pith, and rind, clustered strongly together and only weakly with the remaining tissues. These results are consistent with previous studies in the maize and P. hallii model systems that show core similarities among stem-associated tissues and subsequent divergence of root and leaf samples [24,38].
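The percentages reported for PC1 and PC2 are each component's share of the total variance, which can be obtained from a singular value decomposition of the centered samples-by-probe-sets matrix. This NumPy sketch uses simulated data; whether the study log-transformed or scaled the 29,065 probe sets before PCA is not stated, so those preprocessing choices are assumptions left to the reader.

```python
import numpy as np


def pca(samples_by_features, n_components=2):
    """PCA via SVD of the column-centered matrix (rows = samples).

    Returns sample scores on the first components and the fraction of total
    variance each of those components explains.
    """
    x = np.asarray(samples_by_features, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # variance share of every component
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Hypothetical log2 data: 5 "stem-like" and 5 "root-like" samples that differ
# in the first 25 of 50 features.
rng = np.random.default_rng(seed=1)
x = rng.normal(8.0, 1.0, size=(10, 50))
x[5:, :25] += 3.0

scores, explained = pca(x)
print(f"PC1 = {explained[0]:.0%}, PC2 = {explained[1]:.0%}")
```

Summing the first two entries of `explained` gives the combined figure quoted for PC1 and PC2 together.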

Interestingly, three of the 46 stem samples clustered near the group of meristematic tissues (roots and shoot tips). All three of these 'outlier' samples were collected from the top internode at 61 days after planting (DAP) in three of the six sampled genotypes (PI455230, PI152611, and AR2400). At 70 DAP, the stem samples from the same genotypes clustered with the other stem samples. These lines are characterized as high biomass genotypes, whereas the remaining three genotypes (R159, Atlas, and Fremont) are grain or sweet lines. The PCA indicates that at 61 DAP, the patterns of gene expression in the stems of the high biomass lines are more closely related to those of meristematic regions, or regions of active growth. While it is possible that these three stem samples were collected too close to the meristematic shoot tip region, further study may indicate that the differential transcriptome in the stems of these lines captures a transition zone of gene expression in which sorghum commits to post-reproductive pathways of sugar production and grain fill versus continued biomass production. This result further demonstrates the importance of examining genotype, tissue type, and temporal expression patterns when targeting transcriptional programs of interest.

Tissue and genotype-specific patterns of gene expression 

Figure 4 Classes sharing similar expression patterns. Principal component analysis was applied to 78 tissue samples, based on expression of 29,065 probe sets (27,577 genes, 654 controls and 834 small RNA probe sets). Each symbol represents a single sample. Tissue types are indicated by color and shape of symbol.

To identify tissue-specific genes, we created genotype-specific datasets for PI152611, Fremont, and AR2400, representing three major classes of sorghum: forage, sweet, and high biomass types, respectively. Excluding replicate tissues from the same major organ, we identified genes exclusively expressed in the leaf, shoot, root, shoot tip and stem (Figure 5). The leaf and meristematic shoot tips expressed the greatest number of tissue-specific transcripts across all three genotypes, whereas the seedling shoots expressed the fewest tissue-specific genes. Of particular interest in this dataset is the extent of variation observed across genotypes. For example, over 800 stem-specific genes are identified in the representative sweet and high biomass sorghums, and over 500 in the forage sorghum; however, only 103 stem-specific genes are common to all three sorghum types. This lack of shared tissue-specific genes across genotypes is observed in all major tissue types. We also carried out this analysis for the small RNAs included on the array (Additional file 10). Similar to gene expression, we observed both tissue- and genotype-specific expression of the small RNAs (Additional file 11). For purposes of functional crop improvement, these results highlight the significance of intra-species variation in sorghum and the importance of selecting the appropriate genotype for targeted changes to gene expression via transgenic and breeding approaches.
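The tissue-specific counts and the "Common" category reduce to set operations: a gene is specific to a tissue if it is called present there and in no other tissue of the same genotype, and the common set is the intersection of those tissue-specific sets across genotypes. A minimal sketch with placeholder gene IDs (hypothetical, not sorghum loci) and precomputed present calls:

```python
def tissue_specific(expressed_by_tissue):
    """Genes detected in exactly one tissue of a genotype.

    expressed_by_tissue maps tissue name -> set of gene IDs called present there.
    """
    specific = {}
    for tissue, genes in expressed_by_tissue.items():
        elsewhere = set().union(*(g for t, g in expressed_by_tissue.items() if t != tissue))
        specific[tissue] = genes - elsewhere
    return specific


def common_across_genotypes(specific_by_genotype, tissue):
    """Tissue-specific genes shared by every genotype (the 'Common' category)."""
    return set.intersection(*(spec[tissue] for spec in specific_by_genotype.values()))


# Hypothetical present calls; gene IDs are placeholders.
present = {
    "AR2400":   {"leaf": {"g1", "g2", "g3"}, "stem": {"g3", "g4", "g5"}, "root": {"g6"}},
    "Fremont":  {"leaf": {"g1", "g7"},       "stem": {"g4", "g8"},       "root": {"g6", "g7"}},
    "PI152611": {"leaf": {"g2"},             "stem": {"g4", "g5", "g9"}, "root": {"g9"}},
}

specific = {geno: tissue_specific(calls) for geno, calls in present.items()}
for tissue in ("leaf", "stem", "root"):
    counts = {geno: len(spec[tissue]) for geno, spec in specific.items()}
    print(tissue, counts, "common:", sorted(common_across_genotypes(specific, tissue)))
```

Even in this toy example, each genotype has its own stem-specific genes while only one is shared by all three, which is the pattern the counts above describe at genome scale.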

To illustrate the expression dynamics among tissues, 
we also calculated the relative gene expression levels 
(Z-scores) of each of the major tissues (Figure 6). Con- 
sistent with previous studies, tissues with a relatively 
higher number of tissue-specific genes (e.g. leaf, root, 
shoot tip, pith) had a wide distribution of genes deviat- 
ing from their mean expression. Stem-associated tissues 
had similar expression profiles and gene expression was 
closer to the overall average across tissue types [24,38]. 
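Relative expression as Z-scores standardizes each gene across tissues, so tissues in which many genes sit far from zero are those with more distinctive expression. The sketch below assumes log2-scale input and the sample standard deviation (ddof=1); the text does not specify either choice, and the values are hypothetical.

```python
import numpy as np


def gene_zscores(expr):
    """Z-score each gene (row) across tissues (columns): (x - mean) / SD.

    Genes with identical values in every tissue have no defined Z-score and
    are returned as NaN.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = np.nan            # avoid dividing by zero for flat genes
    return (expr - mean) / sd


# Hypothetical log2 expression of three genes in four tissues.
tissues = ["leaf", "root", "stem", "pith"]
expr = np.array([
    [2.0, 4.0, 6.0, 8.0],
    [5.0, 5.0, 5.0, 5.0],
    [9.0, 1.0, 5.0, 5.0],
])
z = gene_zscores(expr)

# One simple per-tissue summary of how widely genes deviate from their mean.
spread = dict(zip(tissues, np.nanstd(z, axis=0).round(2)))
print(spread)
```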

Figure 5 Number of tissue-specific genes across sorghum ideotypes. AR2400: biomass sorghum; Fremont: sweet sorghum; PI152611: forage sorghum; Common: number of genes in common among all three ideotypes.

We next attempted to determine whether functional gene classes were over-represented in specific genotypes. GO analysis using agriGO (Fisher's exact test with the Yekutieli (false discovery rate under dependency) multiple-testing adjustment) did not reveal statistically significant differences in the enrichment of GO slim terms [39]. However, this can partially be attributed to the incomplete annotation of the sorghum genome, as well as to stage- and tissue-specific expression not captured in our sample collection.
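For readers reproducing this kind of enrichment test outside agriGO, the two statistics named above can be written in a few lines of standard-library Python: a one-sided Fisher's exact (hypergeometric) test per GO term, followed by the Benjamini-Yekutieli adjustment. This is a sketch, not agriGO's implementation; the term counts are hypothetical and using the expressed genes as the background is an assumption.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test (hypergeometric upper tail) for enrichment.

    k: study-set genes annotated with the GO term
    n: genes in the study set
    K: background genes annotated with the term
    N: genes in the background
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)   # exact integer arithmetic, one final division


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli FDR adjustment (valid under arbitrary dependency)."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical GO-slim counts (k, n, K, N) for one genotype's gene set.
terms = {
    "photosynthesis": (30, 500, 300, 19354),
    "cell wall": (12, 500, 400, 19354),
    "transport": (25, 500, 900, 19354),
}
raw = {t: fisher_enrichment_p(*c) for t, c in terms.items()}
adj = dict(zip(raw, benjamini_yekutieli(list(raw.values()))))
print(raw, adj)
```

The harmonic factor c(m) is what makes the Yekutieli procedure more conservative than Benjamini-Hochberg, which is one reason modest enrichments can fail to reach significance after adjustment.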

To identify genotype-specific expression patterns, we examined the expression of several known sugar-metabolizing enzymes and sucrose transporters in sorghum, hypothesizing that differential expression of these genes would be observable across genotypes (Additional file 12). Differential expression between sweet and grain sorghum has recently been shown [21,40], and our results further support this observation, with the majority of sugar-related genes showing differential expression among tissues and genotypes. For example, sweet and high biomass varieties showed consistently higher expression of SPS2 and SPS5, sucrose phosphate synthase genes thought to play significant roles in sucrose biosynthesis, compared to grain varieties (Figure 7). A comprehensive gene list and more detailed expression analysis of sugar-related genes across genotypes may provide insight into the mechanisms governing trade-offs between grain yield and stem sugar content in sorghum.

We further analyzed tissue-specific transcripts to identify genes shared among, and specifically expressed in, multiple tissues (Figures 8 and 9). To avoid variation in gene expression due to genotypic differences, we chose samples from the genotype Atlas for this analysis. We identified 587, 489, and 698 genes that are specifically expressed in leaf, stem and root, respectively, and 232 and 688 genes that are specifically expressed in shoot and shoot tips, respectively (Figure 8). We also identified 960 genes that are specifically expressed in stem rind (predominantly lignified sclerenchymatous cells), compared to 928 genes that are specifically expressed in stem pith (predominantly non-lignified parenchymatous cells; Figure 9). This dataset provides a unique opportunity to discover target sets of genes in core sorghum varieties that may be useful for modulating gene expression in a tissue-dependent manner. For example, these rind- and pith-specific genes can be studied as potential candidate genes for biomass content and as targets for compositional modification of biofuel feedstocks. Further, these data could be used to identify promoter elements, and the corresponding DNA-binding regulatory proteins, that regulate tissue-specific gene expression. As a direct application of this study, we are currently analyzing the promoter regions of candidate genes that are differentially expressed in the rind versus pith region of stem tissues.
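Counting genes specific to rind or pith, or shared by both, follows the same present-call logic applied per tissue. A minimal sketch, assuming a genes-by-tissues matrix of linear-scale values (for example, replicate means) and the 320 cut-off used elsewhere in the study; the numbers are hypothetical.

```python
import numpy as np


def specific_and_shared(expr, tissues, threshold=320.0):
    """Count genes detected in exactly one tissue and genes detected in all tissues.

    expr: genes x tissues matrix of linear-scale normalized values (one column
    per tissue); threshold: present-call cut-off.
    """
    detected = np.asarray(expr) >= threshold
    n_tissues_detected = detected.sum(axis=1)
    only_here = n_tissues_detected == 1
    specific = {t: int(np.sum(detected[:, j] & only_here)) for j, t in enumerate(tissues)}
    shared_by_all = int(np.sum(n_tissues_detected == len(tissues)))
    return specific, shared_by_all


# Hypothetical Atlas stem data: columns are rind and pith.
expr = np.array([
    [500.0, 100.0],   # rind only
    [50.0, 700.0],    # pith only
    [400.0, 900.0],   # both
    [10.0, 20.0],     # neither
    [330.0, 10.0],    # rind only
])
specific, shared = specific_and_shared(expr, ["rind", "pith"])
print(specific, "shared:", shared)
```

The same function extends unchanged to the five-tissue comparison, since "specific" only requires detection in exactly one column.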

Tissue-specific expression of genes involved in the phenylpropanoid-monolignol pathway

To exemplify the functional utility of these data, we highlight the expression of 10 key enzymes associated with the phenylpropanoid-monolignol biosynthesis pathway (Additional file 13). Currently, one of the primary strategies for bioenergy feedstock improvement is lignin modification. Alterations in lignin content and composition aim to improve forage digestibility and the saccharification efficiency of lignocellulosic biofuel feedstocks [41,42]. Modifying the expression of genes in the lignin biosynthesis pathway is therefore an attractive approach to achieving this goal.

Figure 7 Hierarchical clustering of samples based on expression of sucrose-metabolizing enzymes and sucrose transporter genes. Color bar key: Blue: sweet sorghum; red: grain sorghum; green: high biomass sorghum. Outlined in blue, the expression of the sucrose phosphate synthase genes SPS2 and SPS5 is consistently lower in grain types than in sweet and high biomass lines. The sugar metabolism gene list is adapted from the current literature [40].

Using annotations from several databases, we analyzed the majority of known and putative genes and homologs for: phenylalanine ammonia-lyase (PAL, 9 sequences), coumaroyl shikimate 3'-hydroxylase (C3'H, 1), ferulate 5-hydroxylase (F5H, 3), cinnamate 4-hydroxylase (C4H, 3), 4-coumarate:CoA ligase (4CL, 5), cinnamoyl-CoA reductase (CCR, 3), hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase (HCT, 1), caffeoyl-CoA 3-O-methyltransferase (CCoAOMT, 6), caffeic acid 3-O-methyltransferase (COMT, 1), and cinnamyl alcohol dehydrogenase (CAD, 1). Similar to previous studies in maize and switchgrass, the highest expression of these genes was found in the roots and stems [8,43]. Further, hierarchical clustering reveals that the expression of lignin biosynthesis genes varies with developmental stage as well as with tissue type and genotype (Figure 10). Distinct expression signatures of gene homologs, as well as clustering of above-ground vegetative tissues according to developmental stage, have precedent in maize, and in general most of the lignin genes showed organ-specific expression patterns consistent with studies in related species [24,38].
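The hierarchical clustering behind dendrograms like those in Figures 7 and 10 can be illustrated with a naive agglomerative algorithm. This sketch assumes Euclidean distance (as in the Figure 1 legend) and average linkage; the linkage method used for the published figures is not stated, and in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be preferred. The sample values are hypothetical.

```python
import numpy as np


def average_linkage(x, n_clusters):
    """Naive agglomerative clustering of the rows of x (Euclidean distance,
    average linkage), stopped when n_clusters groups remain."""
    x = np.asarray(x, dtype=float)
    clusters = [[i] for i in range(len(x))]
    # Pairwise Euclidean distances between samples (rows).
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                dist = d[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]                           # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical log2 expression of three lignin genes in four samples:
# two with high expression (root/stem-like) and two with low (leaf-like).
samples = np.array([
    [9.0, 8.5, 10.0],
    [9.2, 8.7, 9.8],
    [4.0, 3.5, 5.0],
    [4.1, 3.9, 4.8],
])
groups = average_linkage(samples, n_clusters=2)
print(groups)
```

Recording the merge order and distances instead of stopping at a fixed number of clusters yields the full dendrogram.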

Conclusions 

Comprehensive transcriptome profiling provides a global 
overview of gene networks and allows for the discovery of 
functional connections between genes, mRNAs and their 
regulatory proteins. In the present study, we constructed a 
gene expression atlas covering an array of tissues, develop- 
mental stages and genotypes using the first commercially 
available sorghum microarray (Sorgh-WTa520972F). We 
observed tissue and genotype-specific expression patterns 
of relevant metabolic pathways that highlight the significance of intra-species variation in sorghum.

Developed as a new resource for crop breeding and genomic discovery, Sorgh-WTa520972F is produced by Affymetrix and is available to the public research community. We are currently using this microarray to identify differential gene expression related to key metabolic processes (e.g., starch/lignin biosynthesis) for the identification of regulatory regions. Additional avenues for future study with this array are wide-ranging and include gene expression profiling during abiotic/biotic stress, plant infection and disease establishment to investigate genetic mechanisms, with applications to plant breeding and crop improvement. Detailed expression analysis of the small RNAs included in the array design may also reveal key insights into diverse biological processes, including RNA-guided gene regulation. Sorgh-WTa520972F can also be used in quantitative trait locus (QTL) mapping and validation (e.g., to identify differentially expressed genes in 'tolerant' versus 'sensitive' varieties). The relatively low cost of microarray analysis allows high-throughput generation of expression profiles, or combinations of profiles, for elite breeding lines to accelerate crop-breeding efforts. Applications of this resource can target numerous agronomic traits in sorghum as well as provide insight into closely related grasses (e.g., sugarcane, switchgrass, Miscanthus × giganteus) for improved feedstock development.

Methods 

Tissue collection 

To study the sorghum transcriptome and build the present gene expression atlas, we collected 78 samples from various developmental stages and tissue types (Additional file 1). Six diverse sorghum genotypes were grown in Chromatin's greenhouse and field sites (Champaign, IL). These six genotypes were chosen to represent ideotypes of sorghum cultivation, including sweet, grain and high biomass sorghum varieties. Greenhouse-grown seedling shoot and root samples were collected at 10 DAP, which is roughly five days after plant emergence. Whole leaf and meristematic shoot tip samples were collected at 38 DAP. This time point captures the active growth phase of vegetative structures, including leaves, shoots and tillers. The stem tissue samples were collected at two time points: 61 and 70 DAP. At 61 DAP, the stem is fully formed in both flowering and non-flowering types. In flowering types, the head is also fully formed, and the period between 61 and 70 DAP is a stage of active metabolism, capturing the transition between flowering (61 DAP) and active grain filling (70 DAP) [45]. The stem tissue was further dissected into the pith and the rind. For bioenergy purposes, most of the fermentable sugar available in sorghum is in the pith, whereas most of the lignin is in the rind [46]. Two tissue types (shoot and root) were represented by two biological replicates.

Figure 9 Number of shared and specific expression profiles of genes expressed in multiple stem tissues (Atlas genotype).

Microarray design 

A whole-transcriptome exon array for Sorghum bicolor, Sorgh-WTa520972F, was designed and used for the present expression study. The array contains 1,026,373 probes covering 149,182 exons (27,577 genes) across the Sorghum bicolor nuclear, chloroplast and mitochondrial genomes. The sequences used to construct the probesets included all identified Sorghum bicolor exons from the Sbi1 assembly and Sbi1.4 annotation (http://phytozome.net). We also added sequences for putative non-coding RNAs in Sorghum bicolor that may play a role in gene regulation (e.g., rRNAs, tRNAs, snoRNAs and microRNAs). Confirmed functional small RNAs in closely related species (maize, sugarcane) were also included in our array design (http://bioinformatics.cau.edu.cn/PMRD, http://www.ncrna.org/frnadb) (Additional file 10).

RNA isolation and hybridization

Total RNA from all tissue types was extracted using a
NucleoSpin RNA Plant Kit (Macherey-Nagel, Germany).
RNA integrity, as indicated by the detection of discrete
ribosomal subunits, was verified electrophoretically. RNA
quality and quantity were further validated with a
NanoDrop spectrophotometer (NanoDrop Technologies,
Wilmington, DE). Prior to hybridization, the total RNA
profile was also analyzed with an Agilent 2100 Bioanalyzer
(Agilent Technologies, Waldbronn, Germany). Synthesis
of cDNA, probe labeling and hybridization were performed
by Precision Biomarker (Precision Biomarker Resources,
Inc., Evanston, Illinois).

Data extraction and evaluation of gene expression 

Background correction and normalization were performed
using the robust multi-chip average (RMA) algorithm in
the Bioconductor affy package [13]. Present calls for
expressed genes were determined following established
methods [24]. In brief, a gene was called expressed if its
RMA-normalized linear expression value was ≥ 320 in at
least one of the 78 samples. This cut-off is five times the
mean RMA-normalized signal of 576 negative-control oligos
selected from the intronic regions of known constitutive
genes (e.g., actin, ubiquitin and eIF4A1); the mean signal
intensity of these negative-control oligos across all 78
arrays was 64 (5 × 64 = 320). Constitutively expressed genes
were identified by an RMA-normalized linear expression
value of ≥ 320 in all 78 samples.

Figure 10 Hierarchical clustering of tissues based on expression of phenylpropanoid-monolignol biosynthesis pathway genes.

*Constitutively expressed genes: ubiquitin, Sb10g027470; EIF4A1, Sb04g003390. Color bar key: blue, sweet sorghum; red, grain sorghum; green, high-biomass
sorghum. The color scale indicates relative gene expression (Z-scores); red, yellow and green represent high, medium and low levels of
gene expression, respectively. The phenylpropanoid-monolignol pathway and enzyme nomenclature are adapted from the current literature [44].
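The expressed and constitutive calls above reduce to a per-gene threshold test. The Python sketch below assumes the RMA output is a genes × samples matrix on the usual log2 scale, which is converted back to linear values before comparison; the function name `call_expression` and the matrix layout are illustrative assumptions, not the authors' code. The cut-off is derived exactly as in the text, 5 × 64 = 320.

```python
import numpy as np

# Values from the text: mean negative-control signal of 64 and a
# cut-off of five times that value (5 * 64 = 320).
NEG_CONTROL_MEAN = 64.0
CUTOFF = 5 * NEG_CONTROL_MEAN


def call_expression(log2_rma: np.ndarray, cutoff: float = CUTOFF):
    """Classify genes from a genes x samples matrix of log2 RMA values.

    Hypothetical helper. Returns two boolean arrays, one entry per gene:
      expressed    - linear value >= cutoff in at least one sample
      constitutive - linear value >= cutoff in every sample
    """
    linear = np.exp2(log2_rma)      # assumption: RMA values are log2, convert to linear
    above = linear >= cutoff        # per-gene, per-sample threshold test
    expressed = above.any(axis=1)
    constitutive = above.all(axis=1)
    return expressed, constitutive


# Toy example: 3 genes x 4 samples (log2 scale; log2(320) is about 8.32).
toy = np.array([[9.0, 9.0, 9.0, 9.0],    # 512 everywhere
                [5.0, 5.0, 9.0, 5.0],    # above the cut-off in one sample only
                [6.0, 6.0, 6.0, 6.0]])   # 64 everywhere
expressed, constitutive = call_expression(toy)
print(expressed.tolist(), constitutive.tolist())
```

Running the example prints `[True, True, False] [True, False, False]`: the first gene is both expressed and constitutive, the second is expressed only, and the third (at the negative-control level) is neither.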

Principal component analysis, hierarchical clustering and Z-scores

To study the biological relatedness of the samples and identify
expression trends among them, we applied the cmdscale function
in R and plotted the results in R. Log2-scale RMA-normalized
expression values were used in the PCA. Hierarchical clustering
was performed on the same log2-scale RMA-normalized expression
values, using Pearson's correlation as the similarity measure.
Z-scores were calculated as

$Z = (X - X_{\mathrm{mean}})/\mathrm{SD}$,

where $X$ is the average expression of a given gene in a tissue,
and $X_{\mathrm{mean}}$ and $\mathrm{SD}$ are, respectively, the mean
expression and standard deviation of that gene across all the
selected tissues.
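A minimal NumPy sketch of the three computations in this subsection follows. R's `cmdscale` performs classical (Torgerson) multidimensional scaling; on Euclidean distances between samples its coordinates coincide, up to sign, with principal component scores, and that setup is assumed here. The function names, the use of 1 − Pearson's r as the clustering distance, and the sample standard deviation (n − 1 denominator, as in R's `sd`) in the Z-score are assumptions for illustration.

```python
import numpy as np


def classical_mds(data: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS, the method implemented by R's cmdscale().

    data: samples x genes matrix of log2 RMA values (assumed layout).
    Returns a samples x k matrix of coordinates.
    """
    sq = (data ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T  # squared Euclidean distances
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering          # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)                 # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]               # indices of the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def pearson_distance(data: np.ndarray) -> np.ndarray:
    """1 - Pearson correlation between samples (rows of data)."""
    return 1.0 - np.corrcoef(data)


def z_scores(tissue_means: np.ndarray) -> np.ndarray:
    """Z = (X - X_mean) / SD for each gene.

    tissue_means: genes x tissues; each entry is the average expression X
    of one gene in one tissue. Mean and SD are taken across tissues.
    """
    mean = tissue_means.mean(axis=1, keepdims=True)
    sd = tissue_means.std(axis=1, ddof=1, keepdims=True)  # n - 1 denominator, like R's sd()
    return (tissue_means - mean) / sd
```

The distance matrix from `pearson_distance` could then be converted to condensed form with `scipy.spatial.distance.squareform` (passing `checks=False` to tolerate round-off on the diagonal) and handed to `scipy.cluster.hierarchy.linkage`; the linkage method used in the study is not stated in the text.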

GO Slim enrichment analysis 

We evaluated enrichment of GO slim terms in the biological
process category (http://geneontology.org/GO.slims) with
agriGO (http://bioinfo.cau.edu.cn/agriGO/), using Fisher's
exact test (p < 0.05) and the Yekutieli (false discovery
rate under dependency) multiple-testing adjustment method [39].
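The enrichment test and multiple-testing correction can be sketched in plain Python as below. This assumes a one-sided Fisher's exact test (the hypergeometric upper tail) and the Benjamini-Yekutieli adjustment as defined by R's `p.adjust(method = "BY")`; agriGO's internal implementation may differ, and both function names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    k: study genes annotated to the GO term
    n: study genes in total
    K: background genes annotated to the term
    N: background genes in total
    """
    total = comb(N, n)
    # P(X >= k); math.comb returns 0 when the lower argument exceeds the upper.
    upper = sum(comb(K, i) * comb(N - K, n - i)
                for i in range(k, min(n, K) + 1))
    return upper / total


def benjamini_yekutieli(pvals: list[float]) -> list[float]:
    """FDR-adjusted p-values under arbitrary dependency (BY procedure)."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))   # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0                             # also caps values at 1
    for rank in range(m, 0, -1):                  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For instance, if all 3 study genes fall in a term that covers 3 of 6 background genes, `fisher_enrichment_p(3, 3, 3, 6)` is 1/20 = 0.05; terms would then be reported when their BY-adjusted value stays below the chosen threshold.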

qRT-PCR 

The relative mRNA expression was measured using a Peltier
Thermal Cycler PTC-200 (MJ Research, Waltham, MA, USA)
and the SuperScript III Platinum SYBR Green One-Step
qRT-PCR kit (Invitrogen, Carlsbad, CA). Three independent
reverse transcription reactions
were performed for each RNA sample, and qRT-PCR was
carried out under the following conditions: 100 ng of each
RNA sample was reverse transcribed at 60°C for 3 minutes,
followed by initial activation at 95°C for 5 minutes and
40 amplification cycles of 95°C for 15 s and 50°C for 30 s.
Results were analyzed using MJ Opticon Monitor 3.1.32
software, and the relative expression of mRNA was calculated
by the comparative Ct method ($2^{-\Delta\Delta C_t}$) [47]. Gene
expression values across tissue types were normalized to
ubiquitin expression.
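The comparative Ct calculation, normalized to ubiquitin as described, amounts to a few lines. The sketch below averages the three reverse-transcription replicates per sample and expresses each sample relative to a calibrator sample; the calibrator choice, the replicate averaging and the function name are assumptions, since the text does not specify them.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct: Sequence[float], ref_ct: Sequence[float],
                        cal_target_ct: Sequence[float],
                        cal_ref_ct: Sequence[float]) -> float:
    """Fold change by the 2^-ddCt method (hypothetical helper).

    target_ct / ref_ct: replicate Ct values of the gene of interest and of
    the reference gene (ubiquitin) in the sample being measured.
    cal_target_ct / cal_ref_ct: the same for an assumed calibrator sample.
    """
    d_ct = mean(target_ct) - mean(ref_ct)              # normalize to ubiquitin
    d_ct_cal = mean(cal_target_ct) - mean(cal_ref_ct)  # same for the calibrator
    dd_ct = d_ct - d_ct_cal
    return 2.0 ** -dd_ct


fold = relative_expression([22.0, 22.0, 22.0], [18.0, 18.0, 18.0],
                           [25.0, 25.0, 25.0], [18.0, 18.0, 18.0])
print(fold)
```

Here the sample has ΔCt = 22 − 18 = 4 and the calibrator ΔCt = 25 − 18 = 7, so ΔΔCt = −3 and the example prints `8.0`, an eightfold higher relative expression.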

Availability of supporting data 

The transcriptome dataset supporting the results of 
this article is available through NCBI's Gene Expression 
Omnibus (GEO) under accession number GSE49879, 
and the Sorghum Genome Array is available through
Affymetrix (http://www.affymetrix.com).